This work aims to develop a set of tools and procedures that help a trader efficiently identify the Financial Asset Market Type. Whether the market is Bullish, Bearish, Ranging, or Volatile, the author believes that the ability to detect the market type and use it in trading would be a great advantage!
To test the approach, a secondary method will be developed: identifying the best 'entry' pattern.
Even if this attempt fails, there will still be something learned: the reader will know how to use Deep Learning for Regression and Classification problems.
The basic idea for achieving this will be:
Note: use the R script attached to this repository, `6_h2o_Install.R`, to install h2o on your computer
library(tidyverse)
library(lubridate)
library(plotly)
library(h2o)
Here we can get into the financial data. It can be read and refreshed from the MQL side; how to do this is explained in the Udemy course. For reproducibility, and for those who are not planning to trade, sample data is available in the repository:
# use this option to use sample data:
macd <- read_csv("AI_Macd15.csv", col_names = F)
Parsed with column specification:
cols(
.default = col_double(),
X1 = col_character()
)
See spec(...) for full column specifications.
Here we need to manually change the Y variable in the plot until finding a suitable market condition…
This is the 'manual' part of the process, with a chance of bias. The reader can certainly create custom functions to select periods automatically; the author of this text is too lazy to do that and trusts his own brain more.
The code below will create a time-series plot of one currency pair.
We will extract only the corresponding piece, in this case starting from November 2017…
# extract approximate date and choose only relevant columns
bull_norm <- prices %>% filter(X1 > "2017-11-05", X1 < "2017-11-25") %>% select(X1, X3)
… and visualize it to confirm
Next, we can extract the corresponding piece of the macd dataframe:
macd_bull_norm <- macd %>% select(X1, X3) %>% inner_join(bull_norm, by = c("X1" = "X1"))
and visualize both together
let’s now use this function:
macd_m_bull_norm <- macd_bull_norm %>% select(X3.x) %>% to_m(100)
…to convert this dataset to a matrix with 100 columns
and now we can see the obtained surface as a 3D plot. In this case we have 14 rows, each containing 100 datapoints. Tip: try rotating the obtained object and notice that the majority of points are located in the positive area. Of course there were also 'corrections', hence some rows dip into the negative side…
A brief explanation is probably required: why do we use 100 datapoints (or fewer) in one row? These will serve as the pattern, or fingerprint, of one specific market period. This is exactly the goal of our model later: to digest the last observations and output a value or category, hence recognizing the specific market type…
Let's however make one more consideration: why don't we base our market type decision on the last 8 hours, hence 8*60/15 = 32 M15 bars:
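That bar arithmetic is easy to sanity-check in base R (the helper name `bars_m15` is just illustrative, not part of the project code):

```r
# number of M15 bars contained in a lookback window of `hours` hours
bars_m15 <- function(hours) hours * 60 / 15

bars_m15(8)    # 32 bars for the last 8 hours
bars_m15(24)   # 96 bars for a full day
```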
We have seen that the majority of our observations are in the 'positive' area, but some are not! For that reason it would perhaps be better to remove the observations that do not fit our pattern. Why not try Deep Learning Autoencoders?
The key idea now will be to train a Deep Learning autoencoder model on the selected dataset. We will 'send' this dataset to our JVM:
# load data into h2o environment
macd_bv <- as.h2o(x = macd_m_bull_norm, destination_frame = "macd_bull_norm")
Then we will fit the model:
# fit the model
deepnet_model <- h2o.deeplearning(
x = names(macd_bv),
training_frame = macd_bv,
activation = "Tanh",
autoencoder = TRUE,
hidden = c(20,8,20),
sparse = TRUE,
l1 = 1e-4,
epochs = 100)
Need to pause at any time? Shut down the h2o cluster:
h2o.shutdown(prompt = F)
[1] TRUE
We can now use this model to extract anomalous records. Records that do not correspond to our 'bullish' pattern will have a higher mse value, for example:
We can now find the indexes of observations where the mse error is higher than 0.005
Finally, let's try to see if we can filter out the outliers:
8 observations were filtered…
Well, it seems we could filter our observations, but not completely. Besides, it is questionable whether we should do that in the first place…
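As a plain-R sketch of that filtering step (the `mse` vector below is synthetic and stands in for the per-row reconstruction error that `h2o.anomaly()` returns; the 0.005 threshold is the one used above):

```r
# synthetic per-row reconstruction errors: most rows fit the pattern, two do not
mse <- c(0.001, 0.002, 0.012, 0.0015, 0.051, 0.003)

# indexes of anomalous rows, i.e. error above the threshold
outlier_idx <- which(mse > 0.005)
outlier_idx       # 3 5

# keep only the rows that match the 'bullish' pattern
mse_clean <- mse[-outlier_idx]
length(mse_clean) # 4 rows remain
```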
Next we can create code that selects data for every market type and combines it into one dataframe
Now we have our labelled dataset
Next we can fit the model just by specifying which column is the 'Label'. In this case our label is numeric.
The model we will fit will have this configuration:
| Inputs | hidden layer1 | hidden layer2 | Output |
|---|---|---|---|
| 32 | 100 | 100 | 1 |
#### Fitting Deep Learning Net =================================================
## Fit model now:
# start h2o virtual machine
h2o.init()
Connection successful!
R is connected to the H2O cluster:
H2O cluster uptime: 1 hours 5 minutes
H2O cluster timezone: Europe/Berlin
H2O data parsing timezone: UTC
H2O cluster version: 3.18.0.4
H2O cluster version age: 20 days
H2O cluster name: H2O_started_from_R_fxtrams_nux062
H2O cluster total nodes: 1
H2O cluster total memory: 1.60 GB
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: Algos, AutoML, Core V3, Core V4
R Version: R version 3.4.3 (2017-11-30)
It takes a while until it is trained:
ModelA
summary(ModelA)
h2o.performance(ModelA)
Notice that the model sometimes makes mistakes, even returning negative values!
In fact we should try different models until the results improve, preferring the least complex model possible, for example:
# fit models from simplest to more complex
ModelB <- h2o.deeplearning(
x = names(macd_ML[,1:32]),
y = "M_T",
training_frame = macd_ML,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = c(30,20,30),
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "MSE",
#balance_classes = T,
epochs = 600)
h2o.performance(ModelB)
H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
MSE: 77.40097
RMSE: 8.797782
MAE: 6.257931
RMSLE: NaN
Mean Residual Deviance : 77.40097
# fit models from simplest to more complex
ModelC <- h2o.deeplearning(
x = names(macd_ML[,1:32]),
y = "M_T",
training_frame = macd_ML,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = c(30,30),
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "MSE",
#balance_classes = T,
epochs = 600)
h2o.performance(ModelC)
H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
MSE: 210.7035
RMSE: 14.51563
MAE: 11.72863
RMSLE: NaN
Mean Residual Deviance : 210.7035
# fit models from simplest to more complex
ModelD <- h2o.deeplearning(
x = names(macd_ML[,1:32]),
y = "M_T",
training_frame = macd_ML,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = c(200,100,200),
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "MSE",
#balance_classes = T,
epochs = 600)
h2o.performance(ModelD)
H2ORegressionMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
MSE: 65.23583
RMSE: 8.07687
MAE: 5.380744
RMSLE: 0.2714364
Mean Residual Deviance : 65.23583
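To keep the trials comparable, the training MSE values reported above can be collected into a small dataframe and the best model picked programmatically (numbers transcribed from the `h2o.performance()` outputs above):

```r
# training MSE per model, transcribed from the outputs above
results <- data.frame(
  model  = c("ModelB", "ModelC", "ModelD"),
  hidden = c("30-20-30", "30-30", "200-100-200"),
  mse    = c(77.40097, 210.7035, 65.23583)
)

# model with the lowest training MSE
results$model[which.min(results$mse)]   # "ModelD"
```

Keep in mind these are metrics on the full training frame, so the largest network winning here may simply reflect overfitting rather than better generalization.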
We can also try other parameters in ModelA, for example computing variable importances (not relevant for this task):
# fit models from simplest to more complex
ModelA <- h2o.deeplearning(
x = names(macd_ML[,1:32]),
y = "M_T",
training_frame = macd_ML,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = c(100,100),
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "MSE",
#balance_classes = T,
variable_importances = T,
epochs = 600)
hyper_params <- list(
activation=c("Rectifier","Tanh","Maxout","RectifierWithDropout","TanhWithDropout","MaxoutWithDropout"),
hidden=list(c(100,100),c(50,50),c(30,30,30),c(25,25,25,25)),
input_dropout_ratio=c(0,0.05),
l1=seq(0,1e-4,1e-6),
l2=seq(0,1e-4,1e-6)
)
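Before launching the grid it is worth counting how many combinations `hyper_params` implies; the size explains why a random search capped by `max_models` is used rather than an exhaustive one:

```r
# re-state the grid from above and count the full Cartesian product
hyper_params <- list(
  activation = c("Rectifier", "Tanh", "Maxout",
                 "RectifierWithDropout", "TanhWithDropout", "MaxoutWithDropout"),
  hidden = list(c(100,100), c(50,50), c(30,30,30), c(25,25,25,25)),
  input_dropout_ratio = c(0, 0.05),
  l1 = seq(0, 1e-4, 1e-6),
  l2 = seq(0, 1e-4, 1e-6)
)

# 6 activations * 4 layouts * 2 dropout ratios * ~101 l1 values * ~101 l2 values
prod(sapply(hyper_params, length))   # nearly half a million combinations
```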
This is a method of finding the best model by running many configurations at once: see https://github.com/h2oai/h2o-tutorials/tree/master/tutorials/deeplearning
## Stop once the top 5 models are within 1% of each other (i.e., the windowed average varies less than 1%)
search_criteria = list(strategy = "RandomDiscrete", max_runtime_secs = 360, max_models = 100, seed=1234567, stopping_rounds=5, stopping_tolerance=1e-2)
dl_random_grid <- h2o.grid(
algorithm="deeplearning",
grid_id = "dl_grid_random", # the id used by h2o.getGrid() below
training_frame=macd_ML,
x=names(macd_ML[,1:32]),
y="M_T",
epochs=1,
stopping_metric="MSE",
stopping_tolerance=1e-2, ## stop when MSE does not improve by >=1% for 2 scoring events
stopping_rounds=2,
score_validation_samples=10000, ## downsample validation set for faster scoring
score_duty_cycle=0.025, ## don't score more than 2.5% of the wall time
max_w2=10, ## can help improve stability for Rectifier
hyper_params = hyper_params,
search_criteria = search_criteria
)
grid <- h2o.getGrid("dl_grid_random",sort_by="MSE",decreasing=FALSE)
grid
… this result will need some time to study, won't it?
Let's assume we have the latest information… what will the model say?
plot(macd_latest[1, 1:32])
Error in plot.new() : figure margins too large
In case our label is a categorical variable, we can attempt classification modelling.
We would then re-create the dataset by transforming the label into a categorical variable
ModelCA <- h2o.deeplearning(
x = names(macd_Cat[,1:32]),
y = "M_T",
training_frame = macd_Cat,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = c(100,100),
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "AUTO",
#balance_classes = T,
#variable_importances = T,
epochs = 600)
summary(ModelCA)
Model Details:
==============
H2OMultinomialModel: deeplearning
Model Key: DeepLearning_model_R_1522269773579_1
Status of Neuron Layers: predicting M_T, 6-class classification, multinomial distribution, CrossEntropy loss, 14,006 weights/biases, 172.7 KB, 183,000 training samples, mini-batch size 1
layer units type dropout l1 l2 mean_rate rate_rms momentum mean_weight weight_rms mean_bias bias_rms
1 1 32 Input 0.00 %
2 2 100 Tanh 0.00 % 0.000100 0.000000 0.051910 0.176526 0.000000 0.013192 0.222632 0.017851 0.306225
3 3 100 Tanh 0.00 % 0.000100 0.000000 0.038969 0.029415 0.000000 0.001196 0.147001 0.017889 0.385062
4 4 6 Softmax 0.000100 0.000000 0.069184 0.091559 0.000000 -0.016573 0.853892 -2.322938 1.100623
H2OMultinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
Training Set Metrics:
=====================
Extract training frame with `h2o.getFrame("macd_Cat")`
MSE: (Extract with `h2o.mse`) 0.1307116
RMSE: (Extract with `h2o.rmse`) 0.3615405
Logloss: (Extract with `h2o.logloss`) 0.4384951
Mean Per-Class Error: 0.3162548
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
five four one six tree two Error Rate
five 65 0 3 3 0 0 0.0845 = 6 / 71
four 4 8 2 6 4 0 0.6667 = 16 / 24
one 1 0 41 1 1 0 0.0682 = 3 / 44
six 0 0 5 78 4 0 0.1034 = 9 / 87
tree 0 0 2 1 62 0 0.0462 = 3 / 65
two 2 0 3 7 1 1 0.9286 = 13 / 14
Totals 72 8 56 96 72 1 0.1639 = 50 / 305
Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-6 Hit Ratios:
k hit_ratio
1 1 0.836066
2 2 0.931148
3 3 0.977049
4 4 0.993443
5 5 1.000000
6 6 1.000000
Scoring History:
timestamp duration training_speed epochs iterations samples training_rmse training_logloss
1 2018-03-28 22:47:48 0.000 sec 0.00000 0 0.000000
2 2018-03-28 22:47:48 0.768 sec 6434 obs/sec 10.00000 1 3050.000000 0.87105 4.56768
3 2018-03-28 22:47:54 5.818 sec 11062 obs/sec 200.00000 20 61000.000000 0.52269 1.10962
4 2018-03-28 22:47:59 10.977 sec 12020 obs/sec 420.00000 42 128100.000000 0.51351 0.85628
5 2018-03-28 22:48:03 15.158 sec 12339 obs/sec 600.00000 60 183000.000000 0.36154 0.43850
training_classification_error
1
2 0.80656
3 0.33770
4 0.32787
5 0.16393
Variable Importances: (Extract with `h2o.varimp`)
=================================================
Variable Importances:
variable relative_importance scaled_importance percentage
1 X32 1.000000 1.000000 0.062053
2 X2 0.939487 0.939487 0.058298
3 X3 0.784664 0.784664 0.048691
4 X31 0.654149 0.654149 0.040592
5 X4 0.597804 0.597804 0.037095
---
variable relative_importance scaled_importance percentage
27 X14 0.396537 0.396537 0.024606
28 X29 0.391979 0.391979 0.024323
29 X22 0.370001 0.370001 0.022960
30 X7 0.356184 0.356184 0.022102
31 X1 0.348629 0.348629 0.021633
32 X10 0.339044 0.339044 0.021039
Let's assume we have the latest information… what will the model say?
macd_latest <- macd_ML2[200, 1:32] # label = five
pred200$predict
[1] five
Levels: five
# more trials
ModelCA1 <- h2o.deeplearning(
x = names(macd_Cat[,1:32]),
y = "M_T",
training_frame = macd_Cat,
activation = "Tanh",
overwrite_with_best_model = TRUE,
autoencoder = FALSE,
hidden = c(200,200),
loss = "Automatic",
sparse = TRUE,
l1 = 1e-4,
distribution = "AUTO",
stopping_metric = "AUTO",
#balance_classes = T,
#variable_importances = T,
epochs = 600)
summary(ModelCA1)
Model Details:
==============
H2OMultinomialModel: deeplearning
Model Key: DeepLearning_model_R_1522269773579_5
Status of Neuron Layers: predicting M_T, 6-class classification, multinomial distribution, CrossEntropy loss, 48,006 weights/biases, 573.5 KB, 183,000 training samples, mini-batch size 1
layer units type dropout l1 l2 mean_rate rate_rms momentum mean_weight weight_rms mean_bias bias_rms
1 1 32 Input 0.00 %
2 2 200 Tanh 0.00 % 0.000100 0.000000 0.097310 0.202946 0.000000 0.000910 0.164691 0.012524 0.209276
3 3 200 Tanh 0.00 % 0.000100 0.000000 0.105784 0.181197 0.000000 -0.000612 0.070102 0.005249 0.422971
4 4 6 Softmax 0.000100 0.000000 0.119109 0.194559 0.000000 -0.016467 0.564256 -2.093254 1.284491
H2OMultinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
Training Set Metrics:
=====================
Extract training frame with `h2o.getFrame("macd_Cat")`
MSE: (Extract with `h2o.mse`) 0.1719866
RMSE: (Extract with `h2o.rmse`) 0.4147127
Logloss: (Extract with `h2o.logloss`) 0.629889
Mean Per-Class Error: 0.3891719
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
five four one six tree two Error Rate
five 63 0 6 2 0 0 0.1127 = 8 / 71
four 3 1 7 6 7 0 0.9583 = 23 / 24
one 2 0 42 0 0 0 0.0455 = 2 / 44
six 0 0 7 72 8 0 0.1724 = 15 / 87
tree 1 0 2 0 62 0 0.0462 = 3 / 65
two 2 0 4 7 1 0 1.0000 = 14 / 14
Totals 71 1 68 87 78 0 0.2131 = 65 / 305
Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-6 Hit Ratios:
k hit_ratio
1 1 0.786885
2 2 0.901639
3 3 0.960656
4 4 0.996721
5 5 1.000000
6 6 1.000000
Scoring History:
timestamp duration training_speed epochs iterations samples training_rmse training_logloss
1 2018-03-28 23:03:49 0.000 sec 0.00000 0 0.000000
2 2018-03-28 23:03:50 0.767 sec 4265 obs/sec 10.00000 1 3050.000000 0.89639 5.61147
3 2018-03-28 23:03:55 5.934 sec 4677 obs/sec 90.00000 9 27450.000000 0.76439 2.72534
4 2018-03-28 23:04:01 11.223 sec 4652 obs/sec 170.00000 17 51850.000000 0.58810 1.61467
5 2018-03-28 23:04:06 16.510 sec 4644 obs/sec 250.00000 25 76250.000000 0.54758 1.10819
6 2018-03-28 23:04:11 21.737 sec 4652 obs/sec 330.00000 33 100650.000000 0.45845 0.90001
7 2018-03-28 23:04:16 26.906 sec 4668 obs/sec 410.00000 41 125050.000000 0.56910 1.14495
8 2018-03-28 23:04:22 32.252 sec 4652 obs/sec 490.00000 49 149450.000000 0.42358 0.69182
9 2018-03-28 23:04:27 37.521 sec 4651 obs/sec 570.00000 57 173850.000000 0.41471 0.62989
10 2018-03-28 23:04:29 39.556 sec 4645 obs/sec 600.00000 60 183000.000000 0.59947 1.43570
11 2018-03-28 23:04:29 39.572 sec 4645 obs/sec 600.00000 60 183000.000000 0.41471 0.62989
training_classification_error
1
2 0.82623
3 0.64262
4 0.40000
5 0.36393
6 0.23934
7 0.37377
8 0.20000
9 0.21311
10 0.40656
11 0.21311
Variable Importances: (Extract with `h2o.varimp`)
=================================================
Variable Importances:
variable relative_importance scaled_importance percentage
1 X32 1.000000 1.000000 0.063955
2 X2 0.974053 0.974053 0.062295
3 X3 0.696651 0.696651 0.044554
4 X1 0.636150 0.636150 0.040685
5 X4 0.605534 0.605534 0.038727
---
variable relative_importance scaled_importance percentage
27 X5 0.357844 0.357844 0.022886
28 X22 0.357326 0.357326 0.022853
29 X30 0.354696 0.354696 0.022685
30 X13 0.353810 0.353810 0.022628
31 X29 0.329508 0.329508 0.021074
32 X27 0.325776 0.325776 0.020835
The results are quite different… and the error rate is potentially high!
h2o.performance(ModelCA1)
H2OMultinomialMetrics: deeplearning
** Reported on training data. **
** Metrics reported on full training frame **
Training Set Metrics:
=====================
Extract training frame with `h2o.getFrame("macd_Cat")`
MSE: (Extract with `h2o.mse`) 0.1719866
RMSE: (Extract with `h2o.rmse`) 0.4147127
Logloss: (Extract with `h2o.logloss`) 0.629889
Mean Per-Class Error: 0.3891719
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
five four one six tree two Error Rate
five 63 0 6 2 0 0 0.1127 = 8 / 71
four 3 1 7 6 7 0 0.9583 = 23 / 24
one 2 0 42 0 0 0 0.0455 = 2 / 44
six 0 0 7 72 8 0 0.1724 = 15 / 87
tree 1 0 2 0 62 0 0.0462 = 3 / 65
two 2 0 4 7 1 0 1.0000 = 14 / 14
Totals 71 1 68 87 78 0 0.2131 = 65 / 305
Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
=======================================================================
Top-6 Hit Ratios:
k hit_ratio
1 1 0.786885
2 2 0.901639
3 3 0.960656
4 4 0.996721
5 5 1.000000
6 6 1.000000
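The 0.2131 overall error in that confusion matrix can be reproduced from the raw counts with a few lines of base R (the matrix is transcribed from the output above; rows are actual classes, columns predicted, in the order five, four, one, six, tree, two):

```r
cm <- matrix(c(
  63, 0,  6,  2,  0, 0,
   3, 1,  7,  6,  7, 0,
   2, 0, 42,  0,  0, 0,
   0, 0,  7, 72,  8, 0,
   1, 0,  2,  0, 62, 0,
   2, 0,  4,  7,  1, 0
), nrow = 6, byrow = TRUE)

total   <- sum(cm)              # 305 observations
correct <- sum(diag(cm))        # 240 on the diagonal
round(1 - correct / total, 4)   # 0.2131, matching the report above
```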
In case we want to save the model persistently, we can do so:
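A minimal sketch, assuming the h2o cluster is still up and `ModelCA1` is the model worth keeping (the `models` directory is illustrative):

```r
# save the trained model to disk; h2o.saveModel() returns the path it wrote to
model_path <- h2o.saveModel(object = ModelCA1, path = "models", force = TRUE)

# in a later session the model can be restored with:
# loaded_model <- h2o.loadModel(model_path)
```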
# shutdown...
h2o.shutdown(prompt = F)
[1] TRUE
This procedure can be repeated for every market period… and hopefully it will bear some fruit…?
# Function converting time series data to matrix
to_m <- function(x, n_cols) {
  ### PURPOSE: Transform a time-series column of a dataframe into a matrix
  # with the specified number of columns. The number of rows is found
  # automatically and the remaining data points are discarded
  #
  # Arguments:
  # x      - dataframe with one column
  # n_cols - number of columns of the resulting matrix, e.g. 150
  # find number of rows of the data frame
  nrows <- nrow(x)
  # number of whole rows in the resulting matrix (drop the fractional part)
  WN <- floor(nrows / n_cols)
  # number of data points to actually use
  n <- n_cols * WN
  # extract the relevant matrix
  Step2 <- x %>%
    head(n) %>%                                    # only use whole rows to avoid errors
    t() %>%                                        # this gives us a matrix
    matrix(nrow = WN, ncol = n_cols, byrow = TRUE) # reshape into a WN x n_cols matrix
  # return the result of the function
  return(Step2)
}
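A quick self-contained illustration of what `to_m()` produces, with the reshaping re-implemented inline in base R on a toy series (103 points, 10 columns, the 3 left-over points are discarded):

```r
# toy one-column dataframe standing in for a MACD series
x      <- data.frame(price = 1:103)
n_cols <- 10

WN <- floor(nrow(x) / n_cols)                  # 10 whole rows; 3 points dropped
m  <- matrix(head(x$price, WN * n_cols),
             nrow = WN, ncol = n_cols, byrow = TRUE)

dim(m)      # 10 10
m[1, 1:3]   # 1 2 3  -- each row is one consecutive 'fingerprint' window
m[2, 1]     # 11     -- the next row starts where the previous one ended
```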
See www.h2o.ai –> Download –> Install from R